Scraping Audio Files

I have a friend who is fascinated by languages. Whenever we walk around and hear someone speaking, we try to guess their nationality from their accent. I thought it would be cool to see whether machine learning could classify and predict accents. Looking online for data, I came across the Speech Accent Archive (Weinberger, Steven. (2015). Speech Accent Archive. George Mason University. Retrieved from http://accent.gmu.edu). Below is the script I wrote to scrape its audio files.

In [2]:
# import necessary libraries
import requests
import numpy as np
from bs4 import BeautifulSoup
import os
from os.path import basename
import urllib.request
import subprocess

The Speech Accent Archive has 2869 audio samples covering many different languages. Every participant reads the same passage, so the recordings can be compared directly. The passage is: Please call Stella. Ask her to bring these things with her from the store: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. We also need a small plastic snake and a big toy frog for the kids. She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.

In [ ]:
#2869 samples
links = []
base_url = 'http://accent.gmu.edu/browse_language.php?function=detail&speakerid='
for i in range(2869):
    links.append(base_url+str(i+1))

Fortunately, all the links share the same base URL and differ only in the speaker id, which ranges from 1 to 2869.

In [ ]:
audiolist = []
for j in range(len(links)):
    r = requests.get(links[j])
    soup = BeautifulSoup(r.content, 'html.parser')
    a = soup.find_all('source')
    if a:  # some speaker pages may lack an embedded audio file
        audiolist.append(a[0].get('src'))

Each speaker page embeds its audio file in a source tag in the html; we can pull out its src attribute with Beautiful Soup, a library for parsing and searching HTML.
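As a minimal, self-contained illustration of that extraction (using a made-up snippet and a hypothetical mp3 URL, not the archive's real markup):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a speaker page; the real pages embed the
# audio in a <source> tag whose src attribute points at the mp3.
html = ('<audio controls>'
        '<source src="http://accent.gmu.edu/soundtracks/arabic1.mp3" '
        'type="audio/mpeg"></audio>')
soup = BeautifulSoup(html, 'html.parser')
src = soup.find_all('source')[0].get('src')
print(src)  # http://accent.gmu.edu/soundtracks/arabic1.mp3
```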

In [50]:
from IPython.display import Image
Image(filename="accentsarchive.png")
Out[50]:
In [ ]:
# Strip the digits from each filename to recover the language name
# (e.g. 'arabic12.mp3' -> 'arabic.mp3'), then dedupe to get one entry
# per language.
strip_audiolist = []
for k in range(len(audiolist)):
    strip_audiolist.append(''.join([i for i in basename(audiolist[k]) if not i.isdigit()]))

strip_audiolist = np.unique(strip_audiolist)
strip_audiolist = np.delete(strip_audiolist, 0)

# c holds the per-speaker filenames without extensions (e.g. 'arabic12'),
# d the bare language names (e.g. 'arabic').
c = []
for k in range(len(audiolist)):
    head, sep, tail = basename(audiolist[k]).partition('.')
    c.append(head)

d = []
for k in range(len(strip_audiolist)):
    head, sep, tail = strip_audiolist[k].partition('.')
    d.append(head)
In [ ]:
for i in range(len(d)):
    os.makedirs('D:/AudioFiles/' + d[i], exist_ok=True)

Next, create a folder for each class (i.e. each language) where the corresponding audio files will be saved.

In [ ]:
for k in range(len(audiolist)):
    for l in range(len(strip_audiolist)):
        if d[l] in audiolist[k]:
            urllib.request.urlretrieve(audiolist[k], 'D:/AudioFiles/'+ d[l] + '/' +basename(audiolist[k]))
            subprocess.call(['ffmpeg', '-i', 'D:/AudioFiles/'+ d[l] + '/' +basename(audiolist[k]),
                             'D:/AudioFiles/'+ d[l] + '/' + c[k] +'.wav'])
In [51]:
from IPython.display import Image
Image(filename="intheend.png")
Out[51]:

Using the urllib library we can retrieve the audio files and store them in our desired directories. I also converted the mp3 files to wav with ffmpeg, since wav files are easier to read in Python and thus easier to analyze.

In [52]:
from IPython.display import Image
Image(filename="arabic.png")
Out[52]:
In [ ]:
import pandas as pd
In [6]:
wav = []
dir = []
for i in range(len(d)):
    for filename in os.listdir('D:/AudioFiles/' + d[i]):
        if filename.endswith(".wav"):
             wav.append(filename)
             dir.append('D:/AudioFiles/' + d[i] + '/' + filename)


classes = []
for k in range(len(wav)):
    head, sep, tail = wav[k].partition('.')
    classes.append(head)
dups = []
for k in range(len(classes)):
    dups.append(''.join([i for i in classes[k] if not i.isdigit()]))

counts = pd.Series(dups).value_counts()
In [7]:
counts
Out[7]:
english            628
spanish            223
arabic             175
mandarin           132
korean              96
french              80
russian             79
portuguese          65
tagalog             54
macedonian          54
dutch               52
german              42
japanese            41
turkish             41
bengali             40
bulgarian           40
polish              39
italian             38
cantonese           35
farsi               35
hindi               33
vietnamese          33
urdu                27
amharic             27
hungarian           26
romanian            24
swedish             23
thai                21
serbian             19
nepali              17
                  ... 
kabyle               1
homesign             1
lamotrekese          1
rundi                1
sardinian            1
pohnpeian            1
mankanya             1
sarua                1
tetundili            1
fulfuldeadamawa      1
javanese             1
tumbuka              1
charapa-spanish      1
luxembourgeois       1
hindko               1
tokpisin             1
moba                 1
sindhi               1
chichewa             1
teochew              1
kamba                1
mortlockese          1
agni                 1
sicilian             1
serer                1
naxi                 1
rwanda               1
wali                 1
luba-kasai           1
newari               1
Length: 222, dtype: int64

It would be wise to discard languages below a minimum number of samples: training a model on this many classes is hard, especially when most of them have so few examples. The sample is also imbalanced, with English making up the majority of the records. To start, we can limit the number of classes to 8; once we have a decent model we can try incorporating more. The languages that will be used are listed below.
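To make the cutoff concrete, here is a toy sketch (with made-up labels and a lower threshold) of the same value_counts filter applied in the next cell:

```python
import pandas as pd

# Made-up labels standing in for the stripped filenames; keep only the
# classes that appear more than twice, mirroring counts[counts > 64].
toy_dups = ['english'] * 4 + ['spanish'] * 3 + ['thai']
toy_counts = pd.Series(toy_dups).value_counts()
kept = toy_counts[toy_counts > 2]
print(list(kept.index))  # ['english', 'spanish']
```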

In [8]:
c = counts[counts>64]
c
Out[8]:
english       628
spanish       223
arabic        175
mandarin      132
korean         96
french         80
russian        79
portuguese     65
dtype: int64
In [9]:
reduced_dir = []
for i in range(len(dir)):
    for j in range(len(c.index)):
        if c.index[j] in dir[i]:
            reduced_dir.append(dir[i])
In [10]:
len(reduced_dir)
Out[10]:
1486

This has reduced the dataset to 1486 audio files and 8 classes: English, Spanish, Arabic, Mandarin, Korean, French, Russian, and Portuguese.

Exploratory Data Analysis

Audio carries an enormous number of features; there is a whole field dedicated to analyzing it, and fortunately there are also many Python libraries built for the job.

In [4]:
wav = []
dir = []
d= []
for i in range(len(os.listdir('D:/AudioFiles/'))):
    d.append((os.listdir('D:/AudioFiles/')[i]))
for i in range(len(d)):
    for filename in os.listdir('D:/AudioFiles/' + d[i]):
        if filename.endswith(".wav"):
             wav.append(filename)
             dir.append('D:/AudioFiles/' + d[i] + '/' + filename)
In [5]:
import librosa
import librosa.display
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
In [11]:
classes = []
for k in range(len(wav)):
    head, sep, tail = wav[k].partition('.')
    classes.append(head)
dups = []
for k in range(len(classes)):
    dups.append(''.join([i for i in classes[k] if not i.isdigit()]))
counts = pd.Series(dups).value_counts()

c = counts[counts>64]
reduced_dir = []
for i in range(len(dir)):
    for j in range(len(c.index)):
        if c.index[j] in dir[i]:
            reduced_dir.append(dir[i])
In [12]:
wavs = []
for j in range(len(reduced_dir)):
    wavs.append(basename(reduced_dir[j]))
classes = []
for k in range(len(wavs)):
    head, sep, tail = wavs[k].partition('.')
    classes.append(head)
dups = []
for k in range(len(classes)):
    dups.append(''.join([i for i in classes[k] if not i.isdigit()]))

There are many ways to visualize audio, such as spectrograms, chromagrams, waveplots, colour maps, and tempograms. Below are the different visualizations for an Arabic-accent audio sample.

In [13]:
y, sr = librosa.load(reduced_dir[0])
plt.figure(figsize=(12, 8))

D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
plt.subplot(4, 2, 1)
librosa.display.specshow(D, y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')

# Or on a logarithmic scale

plt.subplot(4, 2, 2)
librosa.display.specshow(D, y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Log-frequency power spectrogram')

# Or use a CQT scale

CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr)), ref=np.max)
plt.subplot(4, 2, 3)
librosa.display.specshow(CQT, y_axis='cqt_note')
plt.colorbar(format='%+2.0f dB')
plt.title('Constant-Q power spectrogram (note)')

plt.subplot(4, 2, 4)
librosa.display.specshow(CQT, y_axis='cqt_hz')
plt.colorbar(format='%+2.0f dB')
plt.title('Constant-Q power spectrogram (Hz)')

# Draw a chromagram with pitch classes

C = librosa.feature.chroma_cqt(y=y, sr=sr)
plt.subplot(4, 2, 5)
librosa.display.specshow(C, y_axis='chroma')
plt.colorbar()
plt.title('Chromagram')

# Force a grayscale colormap (white -> black)

plt.subplot(4, 2, 6)
librosa.display.specshow(D, cmap='gray_r', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear power spectrogram (grayscale)')

# Draw time markers automatically

plt.subplot(4, 2, 7)
librosa.display.specshow(D, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Log power spectrogram')

# Draw a tempogram with BPM markers

plt.subplot(4, 2, 8)
Tgram = librosa.feature.tempogram(y=y, sr=sr)
librosa.display.specshow(Tgram, x_axis='time', y_axis='tempo')
plt.colorbar()
plt.title('Tempogram')
plt.tight_layout()
plt.show()

We can plot power spectrograms for each of the 8 classes to look for differences between them. The samples show high intra-class variance and little inter-class separation, which can make classification very tricky.

In [14]:
# Plot four sample spectrograms per class; each starting index points at
# the first of that class's files in reduced_dir.
class_starts = [('Arabic', 0), ('English', 176), ('French', 820),
                ('Korean', 890), ('Mandarin', 990), ('Russian', 1190),
                ('Portuguese', 1150), ('Spanish', 1290)]
for name, start in class_starts:
    plt.figure(figsize=(12, 8))
    for offset in range(4):
        y, sr = librosa.load(reduced_dir[start + offset], duration=20)
        Xdb = librosa.amplitude_to_db(abs(librosa.stft(y)))
        plt.subplot(4, 2, offset + 1)
        librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
        plt.colorbar(format='%+2.0f dB')
        if offset < 2:  # title only the first two panels, as before
            plt.title(name)
    plt.show()

Using librosa we can also extract mel-frequency cepstral coefficients (MFCCs). MFCCs are computed on the mel scale, which approximates human hearing, and they are commonly used in speech recognition applications. These MFCC values can be fed directly into the neural network.

In [16]:
mels = []
for filename in range(len(reduced_dir)):
    # load the audio time series and its sampling rate, capped at 20 seconds
    y, sr = librosa.load(reduced_dir[filename], duration=20)
    mfcc = librosa.feature.mfcc(y=y, sr=sr)
    mfcc /= np.amax(np.absolute(mfcc))  # normalize the values
    mels.append(mfcc.flatten())
In [17]:
mfcc_df = pd.DataFrame(mels)
mfcc_df = mfcc_df.assign(label = pd.Series(dups).values)
In [18]:
# drop some lone subclasses; these were different dialects of the main languages
df = mfcc_df[mfcc_df.label != 'charapa-spanish']
df = df[df.label != 'haitiancreolefrench']
In [19]:
df = df.drop(df.index[1255])
In [20]:
df['label'] = pd.Categorical(df['label'])
df['label'] = df.label.cat.codes
df = df.fillna(0)

Since the classes are imbalanced, it would be wise to split the train and test sets proportionally. This can be done with scikit-learn's StratifiedShuffleSplit.
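As a quick sanity check on what the stratified split preserves, here is a toy example with an 80/20 class imbalance (made-up data, not the accent set):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# 100 toy samples with an 80/20 class imbalance.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(split.split(X, y))

# Both halves keep roughly the original 20% share of class 1.
print(y[train_idx].mean(), y[test_idx].mean())
```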

In [21]:
from sklearn.model_selection import StratifiedShuffleSplit
data = df.drop(columns=['label'])
labels = df.label
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(data,labels):
    X_train, X_test = data.iloc[train_index], data.iloc[test_index]
    y_train, y_test = labels.iloc[train_index], labels.iloc[test_index]
In [22]:
# check the proportions
labels.value_counts(normalize=True)
Out[22]:
1    0.424899
7    0.150880
0    0.118403
4    0.089310
3    0.064953
2    0.054127
6    0.053451
5    0.043978
Name: label, dtype: float64
In [23]:
y_train.value_counts(normalize=True)
Out[23]:
1    0.424704
7    0.150592
0    0.118443
4    0.089679
3    0.065144
2    0.054146
6    0.053299
5    0.043993
Name: label, dtype: float64
In [24]:
y_test.value_counts(normalize=True)
Out[24]:
1    0.425676
7    0.152027
0    0.118243
4    0.087838
3    0.064189
6    0.054054
2    0.054054
5    0.043919
Name: label, dtype: float64

The proportions of the classes for the train and test set are all similar to the original dataset.

In [44]:
%reload_ext tensorboard
In [25]:
from tensorflow import keras
import tensorflow as tf
import datetime
In [45]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1000, activation = 'relu', input_dim=np.shape(X_train)[1]),
    tf.keras.layers.Dense(600, activation='relu'),
    #tf.keras.layers.Dropout(0.5),
    #tf.keras.layers.Dense(300, activation='relu'),
    #tf.keras.layers.Dropout(0.5),
    #tf.keras.layers.Dense(800, activation='relu'),
    #tf.keras.layers.Dropout(0.5),
    #tf.keras.layers.Dense(500, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(8, activation = 'softmax')
])
In [46]:
model.compile(tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

log_dir="D:\\logs\\fit\\" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

history = model.fit(x=X_train,
          y=y_train,
          epochs=20,
          validation_data=(X_test, y_test),
          callbacks=[tensorboard_callback])
WARNING:tensorflow:Falling back from v2 loop because of error: Failed to find data adapter that can handle input: <class 'pandas.core.frame.DataFrame'>, <class 'NoneType'>
Train on 1182 samples, validate on 296 samples
Epoch 1/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 2.0397 - accuracy: 0.3435 - val_loss: 1.7969 - val_accuracy: 0.4257
Epoch 2/20
1182/1182 [==============================] - 6s 5ms/sample - loss: 1.7528 - accuracy: 0.4222 - val_loss: 1.7305 - val_accuracy: 0.4324
Epoch 3/20
1182/1182 [==============================] - 6s 5ms/sample - loss: 1.6240 - accuracy: 0.4433 - val_loss: 1.6840 - val_accuracy: 0.4223
Epoch 4/20
1182/1182 [==============================] - 6s 5ms/sample - loss: 1.5521 - accuracy: 0.4814 - val_loss: 1.6325 - val_accuracy: 0.4392
Epoch 5/20
1182/1182 [==============================] - 6s 5ms/sample - loss: 1.4326 - accuracy: 0.5093 - val_loss: 1.6008 - val_accuracy: 0.4493
Epoch 6/20
1182/1182 [==============================] - 6s 5ms/sample - loss: 1.3417 - accuracy: 0.5237 - val_loss: 1.7623 - val_accuracy: 0.4527
Epoch 7/20
1182/1182 [==============================] - 6s 5ms/sample - loss: 1.2510 - accuracy: 0.5550 - val_loss: 1.7271 - val_accuracy: 0.4324
Epoch 8/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 1.0766 - accuracy: 0.6176 - val_loss: 1.9201 - val_accuracy: 0.4426
Epoch 9/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.9982 - accuracy: 0.6523 - val_loss: 1.8778 - val_accuracy: 0.3851
Epoch 10/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.7402 - accuracy: 0.7267 - val_loss: 1.9882 - val_accuracy: 0.4291
Epoch 11/20
1182/1182 [==============================] - 6s 5ms/sample - loss: 0.7039 - accuracy: 0.7724 - val_loss: 2.7601 - val_accuracy: 0.4628
Epoch 12/20
1182/1182 [==============================] - 6s 5ms/sample - loss: 0.4711 - accuracy: 0.8503 - val_loss: 2.5178 - val_accuracy: 0.4291
Epoch 13/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.2801 - accuracy: 0.9205 - val_loss: 2.5946 - val_accuracy: 0.4426
Epoch 14/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.1929 - accuracy: 0.9399 - val_loss: 2.8577 - val_accuracy: 0.4459
Epoch 15/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.1153 - accuracy: 0.9704 - val_loss: 3.0916 - val_accuracy: 0.4020
Epoch 16/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.0569 - accuracy: 0.9865 - val_loss: 3.2862 - val_accuracy: 0.4189
Epoch 17/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.0442 - accuracy: 0.9924 - val_loss: 3.4370 - val_accuracy: 0.4054
Epoch 18/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.0452 - accuracy: 0.9865 - val_loss: 3.6807 - val_accuracy: 0.4189
Epoch 19/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.0186 - accuracy: 0.9983 - val_loss: 3.7571 - val_accuracy: 0.4527
Epoch 20/20
1182/1182 [==============================] - 7s 6ms/sample - loss: 0.0074 - accuracy: 1.0000 - val_loss: 3.8621 - val_accuracy: 0.4291
In [48]:
#%tensorboard --logdir logs/fit

The training log shows clear overfitting: validation accuracy stalls around 40-45% while training accuracy climbs to 100%. This means the model's hyperparameters need more tuning, or another form of input data needs to be used. Applied deep learning is a very empirical process, and TensorBoard helps keep track of the trial and error.
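One cheap mitigation is early stopping on validation loss (Keras provides this as tf.keras.callbacks.EarlyStopping with restore_best_weights=True). Applying that rule by hand to the validation losses logged above would stop training around epoch 5:

```python
# Validation losses copied from the training log above, one per epoch.
val_losses = [1.7969, 1.7305, 1.6840, 1.6325, 1.6008, 1.7623, 1.7271,
              1.9201, 1.8778, 1.9882, 2.7601, 2.5178, 2.5946, 2.8577,
              3.0916, 3.2862, 3.4370, 3.6807, 3.7571, 3.8621]

# Keep the epoch (1-indexed) where validation loss bottomed out.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1
print(best_epoch)  # 5
```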